Enhancing network embedding with implicit clustering
Network embedding aims to learn low-dimensional representations of nodes. These representations can be widely used in network mining tasks such as link prediction, anomaly detection, and classification. Recently, a great deal of meaningful research has been carried out on this emerging network analysis paradigm. Real-world networks contain clusters of different sizes because their edges represent different relationship types. These clusters also reflect features of the nodes, which can contribute to optimizing the nodes' feature representations. However, existing network embedding methods do not distinguish these relationship types. In this paper, we propose an unsupervised network representation learning model that encodes edge relationship information. First, an objective function is defined that learns edge vectors by implicit clustering. Then, a biased random walk is designed to generate a series of node sequences, which are fed into Skip-Gram to learn low-dimensional node representations. Extensive experiments are conducted on several network datasets. Compared with state-of-the-art baselines, the proposed method achieves favorable and stable results on multi-label classification and link prediction tasks.
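As a rough illustration of the pipeline this abstract describes, the sketch below wires a plain (unbiased) random walk into gensim's Skip-Gram; the paper's edge-vector objective and its specific walk bias are not reproduced, and names such as `random_walks`, `walk_length`, and `num_walks` are illustrative.

```python
# Sketch of the generic walk-plus-Skip-Gram machinery (DeepWalk-style).
# The paper's implicit-clustering objective and biased walk are NOT shown;
# a uniform random walk stands in for the biased one.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(graph, num_walks=10, walk_length=40, seed=0):
    """Generate node sequences by plain random walks (illustrative stand-in)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(graph.nodes())
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = list(graph.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
# Skip-Gram (sg=1) over the walk "sentences" yields low-dimensional node vectors.
model = Word2Vec(random_walks(G), vector_size=64, window=5, sg=1, min_count=0)
embeddings = {n: model.wv[str(n)] for n in G.nodes()}
```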
Empirical Comparison of Graph Embeddings for Trust-Based Collaborative Filtering
In this work, we study the utility of graph embeddings to generate latent
user representations for trust-based collaborative filtering. In a cold-start
setting, on three publicly available datasets, we evaluate approaches from four
method families: (i) factorization-based, (ii) random walk-based, (iii) deep
learning-based, and (iv) the Large-scale Information Network Embedding (LINE)
approach. We find that across the four families, random-walk-based approaches
consistently achieve the best accuracy. Moreover, they result in highly novel
and diverse recommendations. Furthermore, our results show that the use of
graph embeddings in trust-based collaborative filtering significantly improves
user coverage.
Comment: 10 pages, accepted as a full paper at the 25th International
Symposium on Methodologies for Intelligent Systems (ISMIS'20).
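A minimal sketch of how trust-graph embeddings might feed a collaborative-filtering step: given user vectors learned on the trust network (for example by a random-walk method), recommend items liked by the nearest neighbours in embedding space. The function and variable names are hypothetical, and this is not the paper's evaluation protocol.

```python
# Hypothetical use of trust-graph user embeddings for recommendation:
# score unseen items by the ratings of the target user's nearest
# neighbours in embedding space, weighted by cosine similarity.
import numpy as np

def recommend(user_vecs, ratings, target, k=10, top_n=5):
    """user_vecs: {user: np.ndarray}; ratings: {user: {item: rating}}."""
    u = user_vecs[target]
    sims = {
        other: float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
        for other, v in user_vecs.items() if other != target
    }
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = {}
    for nb in neighbours:
        for item, r in ratings.get(nb, {}).items():
            if item not in ratings.get(target, {}):
                scores[item] = scores.get(item, 0.0) + sims[nb] * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```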
Estimating the intrinsic dimension of datasets by a minimal neighborhood information
Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations, and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a manifold whose intrinsic dimension (ID) is much lower than the raw number of coordinates. Such a manifold is generally twisted and curved; in addition, points on it are non-uniformly distributed: two factors that make identifying the ID, and exploiting it, genuinely hard. Here we propose a new ID estimator that uses only the distances to the first and second nearest neighbors of each point in the sample. This extreme minimality enables us to reduce the effects of curvature and density variation, as well as the computational cost. The ID estimator is theoretically exact for uniformly distributed datasets and provides consistent measures in general. When used in combination with block analysis, it allows discriminating the relevant dimensions as a function of the block size, so the ID can be estimated even when the data lie on a manifold perturbed by high-dimensional noise, a situation often encountered in real-world datasets. We demonstrate the usefulness of the approach on molecular simulations and image analysis.
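A small sketch of the two-nearest-neighbour idea as read from this abstract: if sampling is locally uniform, the ratio of the second to the first neighbour distance follows a Pareto law whose shape parameter is the intrinsic dimension, so a maximum-likelihood estimate only needs the sum of log-ratios. The helper below is illustrative, not the authors' reference implementation.

```python
# Illustrative two-nearest-neighbour ID estimate: mu = r2/r1 is (under local
# uniformity) Pareto-distributed with shape equal to the intrinsic dimension,
# so the ML estimate only needs the sum of log-ratios.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_id(X):
    d, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = d[:, 1], d[:, 2]               # column 0 is the point itself
    mu = r2 / r1
    mu = mu[np.isfinite(mu) & (mu > 1.0)]    # guard against duplicate points
    return len(mu) / np.sum(np.log(mu))

# A 2D linear manifold embedded in 10 dimensions: the estimate should be near 2.
X = np.random.default_rng(0).random((5000, 2)) @ np.random.default_rng(1).random((2, 10))
print(two_nn_id(X))
```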
Applications of Information Theory to Analysis of Neural Data
Information theory is a practical and theoretical framework developed for the
study of communication over noisy channels. Its probabilistic basis and
capacity to relate statistical structure to function make it ideally suited for
studying information flow in the nervous system. It has a number of useful
properties: it is a general measure sensitive to any relationship, not only
linear effects; it has meaningful units which in many cases allow direct
comparison between different experiments; and it can be used to study how much
information can be gained by observing neural responses in single trials,
rather than in averages over multiple trials. A variety of information-theoretic
quantities are commonly used in neuroscience (see the entry
"Definitions of Information-Theoretic Quantities"). In this entry we review
some applications of information theory in neuroscience to the study of how
information is encoded in both single neurons and neuronal populations.
Comment: 8 pages, 2 figures.
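For concreteness, a plug-in estimate of the most commonly used quantity, the mutual information between a discrete stimulus and a discretized response, can be computed directly from a trial-count table; such naive estimates are biased for small trial numbers, which is one reason dedicated estimators are used in practice. The sketch below is purely illustrative.

```python
# Plug-in mutual information (in bits) between a discrete stimulus and a
# discretized response, from a table of trial counts. Purely illustrative;
# finite-sampling bias must be handled in real analyses.
import numpy as np

def mutual_information(counts):
    """counts[s, r] = number of trials with stimulus s and response bin r."""
    p = counts / counts.sum()
    ps = p.sum(axis=1, keepdims=True)          # stimulus marginal
    pr = p.sum(axis=0, keepdims=True)          # response marginal
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (ps @ pr)[nz])))

counts = np.array([[30, 5, 1],
                   [4, 28, 6],
                   [2, 7, 27]])
print(mutual_information(counts))              # bits per trial
```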
Improving Sparse Representation-Based Classification Using Local Principal Component Analysis
Sparse representation-based classification (SRC), proposed by Wright et al.,
seeks the sparsest decomposition of a test sample over the dictionary of
training samples, with classification to the most-contributing class. Because
it assumes test samples can be written as linear combinations of their
same-class training samples, the success of SRC depends on the size and
representativeness of the training set. Our proposed classification algorithm
enlarges the training set by using local principal component analysis to
approximate the basis vectors of the tangent hyperplane of the class manifold
at each training sample. The dictionary in SRC is replaced by a local
dictionary that adapts to the test sample and includes training samples and
their corresponding tangent basis vectors. We use a synthetic data set and
three face databases to demonstrate that this method can achieve higher
classification accuracy than SRC in cases of sparse sampling, nonlinear class
manifolds, and stringent dimension reduction.
Comment: Published in "Computational Intelligence for Pattern Recognition,"
editors Shyi-Ming Chen and Witold Pedrycz. The original publication is
available at http://www.springerlink.co
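A hedged sketch of the general recipe this abstract describes: estimate a few tangent directions at each training sample by local PCA, add them to the dictionary alongside the samples, solve a sparse (here l1-penalized) fit for the test sample, and classify by the class with the smallest reconstruction residual. For simplicity the dictionary below is global, whereas the paper's dictionary adapts locally to the test sample; solver and parameter choices are illustrative.

```python
# Hedged sketch: dictionary of training samples plus local-PCA tangent
# directions, sparse fit of the test sample, classification by smallest
# per-class reconstruction residual. A simplified global dictionary is used
# here, unlike the paper's locally adapted one.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.neighbors import NearestNeighbors

def build_dictionary(X, y, n_tangent=2, n_local=8):
    cols, labels = [], []
    nn = NearestNeighbors(n_neighbors=n_local).fit(X)
    for i, x in enumerate(X):
        cols.append(x)
        labels.append(y[i])
        _, idx = nn.kneighbors([x])
        local = X[idx[0]] - x                       # neighbourhood centred at x
        for b in PCA(n_components=n_tangent).fit(local).components_:
            cols.append(b)                          # tangent basis vector at x
            labels.append(y[i])
    D = np.array(cols, dtype=float).T
    D = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)
    return D, np.array(labels)

def src_predict(D, labels, x, alpha=0.01):
    coef = Lasso(alpha=alpha, max_iter=5000).fit(D, x).coef_
    residuals = {c: np.linalg.norm(x - D[:, labels == c] @ coef[labels == c])
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)
```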
Photometric stereo for 3D face reconstruction using non-linear illumination models
Face recognition in the presence of illumination changes, pose variation, and different facial expressions is a challenging problem. In this paper, a method for 3D face reconstruction using photometric stereo, without knowledge of the illumination directions or the facial expression, is proposed in order to improve face recognition. A dimensionality reduction method is introduced to represent the face deformations caused by illumination variations and self-shadows in a lower-dimensional space. The obtained mapping function is used to determine the illumination direction of each input image, and that direction is then used to apply photometric stereo. Experiments with faces were performed in order to evaluate the performance of the proposed scheme. They show that the proposed approach produces very accurate 3D surfaces without knowing the light directions, with very small differences compared to the case of known directions. As a result, the proposed approach is more general, imposes fewer restrictions, and enables 3D face recognition methods to operate with less data.
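For context, the classical Lambertian photometric-stereo step that such methods build on can be written as a per-pixel least-squares problem once the light directions are available; in the paper those directions are estimated via dimensionality reduction, which is not shown in this sketch.

```python
# Classical Lambertian photometric stereo with known light directions; the
# paper instead recovers the directions through dimensionality reduction.
import numpy as np

def photometric_stereo(images, lights):
    """images: (k, h, w) intensities; lights: (k, 3) unit light directions."""
    k, h, w = images.shape
    I = images.reshape(k, -1)
    # Lambertian model: I ~= lights @ G, where each column of G is albedo * normal.
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)
    G = G.reshape(3, h, w)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / (albedo + 1e-12)
    return normals, albedo
```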
Validation of nonlinear PCA
Linear principal component analysis (PCA) can be extended to a nonlinear PCA
by using artificial neural networks. But the benefit of curved components
requires a careful control of the model complexity. Moreover, standard
techniques for model selection, including cross-validation and more generally
the use of an independent test set, fail when applied to nonlinear PCA because
of its inherently unsupervised nature. This paper presents a new
approach for validating the complexity of nonlinear PCA models by using the
error in missing data estimation as a criterion for model selection. It is
motivated by the idea that only the model of optimal complexity is able to
predict missing values with the highest accuracy. While standard test set
validation usually favours over-fitted nonlinear PCA models, the proposed model
validation approach correctly selects the optimal model complexity.
Comment: 12 pages, 5 figures.
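A minimal sketch of the selection criterion itself, with ordinary PCA standing in for the nonlinear model: entries are deliberately blanked out, estimated from the observed entries through the fitted model, and the complexity with the lowest error on the blanked entries is kept. The nonlinear case would replace the linear reconstruction with the network's decoder; everything below is illustrative.

```python
# Model selection by missing-data estimation error, with linear PCA standing
# in for nonlinear PCA. Blanked entries are predicted from the observed ones
# through the fitted model; the best complexity minimises this error.
import numpy as np
from sklearn.decomposition import PCA

def missing_data_error(X, n_components, frac_missing=0.2, seed=0):
    rng = np.random.default_rng(seed)
    pca = PCA(n_components=n_components).fit(X)
    W, m = pca.components_, pca.mean_            # W: (n_components, n_features)
    errors = []
    for x in X:
        miss = rng.random(x.size) < frac_missing
        if miss.all() or not miss.any():
            continue
        obs = ~miss
        # latent code fitted to the observed entries only
        z, *_ = np.linalg.lstsq(W[:, obs].T, x[obs] - m[obs], rcond=None)
        x_hat = W[:, miss].T @ z + m[miss]       # predicted missing entries
        errors.append(np.mean((x_hat - x[miss]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(300, 10))
for k in (1, 2, 3, 5, 8):        # error is typically lowest near the true dimension (here 3)
    print(k, round(missing_data_error(X, k), 4))
```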
Non-Redundant Spectral Dimensionality Reduction
Spectral dimensionality reduction algorithms are widely used in numerous
domains, including recognition, segmentation, tracking, and visualization.
However, despite their popularity, these algorithms suffer from a major
limitation known as the "repeated Eigen-directions" phenomenon. That is, many
of the embedding coordinates they produce typically capture the same direction
along the data manifold. This leads to redundant and inefficient
representations that do not reveal the true intrinsic dimensionality of the
data. In this paper, we propose a general method for avoiding redundancy in
spectral algorithms. Our approach relies on replacing the orthogonality
constraints underlying those methods by unpredictability constraints.
Specifically, we require that each embedding coordinate be unpredictable (in
the statistical sense) from all previous ones. We prove that these constraints
necessarily prevent redundancy, and provide a simple technique to incorporate
them into existing methods. As we illustrate on challenging high-dimensional
scenarios, our approach produces significantly more informative and compact
representations, which improve performance on visualization and classification tasks.
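One way to make the unpredictability idea concrete, though not the authors' algorithm: compute candidate spectral coordinates and greedily keep an eigenvector only if a nonparametric regressor cannot predict it from the coordinates already kept. The paper instead builds the constraint into the embedding optimization itself; the thresholds and model choices below are illustrative.

```python
# Post-hoc illustration of unpredictability: keep a spectral coordinate only
# if a k-NN regressor cannot predict it from the coordinates kept so far.
# The paper builds this constraint into the embedding optimisation itself.
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

def non_redundant_coords(X, n_candidates=10, n_keep=2, r2_threshold=0.9):
    Y = SpectralEmbedding(n_components=n_candidates).fit_transform(X)
    kept = [0]                                   # the first coordinate is always new
    for j in range(1, n_candidates):
        if len(kept) == n_keep:
            break
        r2 = cross_val_score(KNeighborsRegressor(n_neighbors=10),
                             Y[:, kept], Y[:, j], cv=3, scoring="r2").mean()
        if r2 < r2_threshold:                    # not predictable: a genuinely new direction
            kept.append(j)
    return Y[:, kept]
```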
On the challenges and opportunities in visualization for machine learning and knowledge extraction: A research agenda
We describe a selection of challenges at the intersection of machine learning and data visualization and outline a subjective research agenda based on professional and personal experience. The unprecedented increase in the amount, variety, and value of data has been significantly transforming the way scientific research is carried out and businesses operate. Data science has emerged as a practice that enables this data-intensive innovation by gathering together and advancing knowledge from fields such as statistics, machine learning, knowledge extraction, data management, and visualization. Within data science, visualization plays a unique, and maybe the ultimate, role as an approach to facilitating cooperation between humans and computers, particularly in enabling the analysis of diverse and heterogeneous data with complex computational methods whose results are challenging to interpret and operationalize. While algorithm development is surely at the center of the pipeline in disciplines such as machine learning and knowledge discovery, it is visualization that ultimately makes the results accessible to the end user. Visualization can thus be seen as a mapping from arbitrarily high-dimensional abstract spaces to lower dimensions, and it plays a central and critical role in interacting with machine learning algorithms, particularly in interactive machine learning (iML) with a human in the loop. The central goal of the CD-MAKE VIS workshop is to spark discussions at this intersection of visualization, machine learning, and knowledge discovery and to bring together experts from these disciplines. This paper discusses a perspective on the challenges and opportunities in integrating these disciplines and presents a number of directions and strategies for further research.
Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction
It is difficult to find the optimal sparse solution of a manifold-learning-based
dimensionality reduction algorithm. Lasso- or elastic-net-penalized
manifold-learning-based dimensionality reduction is not directly a
lasso-penalized least squares problem, and thus least angle regression (LARS)
(Efron et al.), one of the most popular algorithms in sparse learning, cannot
be applied. Therefore, most current approaches take indirect routes or impose
strict settings, which can be inconvenient in applications. In this paper, we
propose the manifold elastic net, or MEN for short. MEN
incorporates the merits of both the manifold learning based dimensionality
reduction and the sparse learning based dimensionality reduction. By using a
series of equivalent transformations, we show that MEN is equivalent to a
lasso-penalized least squares problem, and thus LARS is adopted to obtain the optimal
sparse solution of MEN. In particular, MEN has the following advantages for
subsequent classification: 1) the local geometry of samples is well preserved
for low dimensional data representation, 2) both the margin maximization and
the classification error minimization are considered for sparse projection
calculation, 3) the projection matrix of MEN improves the parsimony in
computation, 4) the elastic net penalty reduces the over-fitting problem, and
5) the projection matrix of MEN can be interpreted psychologically and
physiologically. Experimental evidence on face recognition over various popular
datasets suggests that MEN is superior to leading dimensionality reduction
algorithms.
Comment: 33 pages, 12 figures.
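The standard building block behind such equivalences can be sketched as follows: an elastic-net-penalized least-squares problem becomes a pure lasso problem on augmented data, so a LARS-type solver applies. MEN's own chain of equivalent transformations is more involved and is not reproduced here; the scaling of `alpha` matches scikit-learn's LassoLars objective, and the function name is illustrative.

```python
# Elastic-net-penalised least squares rewritten as a lasso on augmented data,
# so a LARS solver applies. This is the generic reduction, not MEN itself.
import numpy as np
from sklearn.linear_model import LassoLars

def elastic_net_via_lars(X, y, lam1, lam2):
    n, p = X.shape
    # Appending sqrt(lam2) * I with zero targets absorbs the ridge term into
    # the least-squares part, leaving a pure l1 problem that LARS can solve.
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    # scikit-learn's LassoLars minimises (1/(2N))||y - Xw||^2 + alpha*||w||_1
    model = LassoLars(alpha=lam1 / (2 * X_aug.shape[0]), fit_intercept=False)
    model.fit(X_aug, y_aug)
    return model.coef_                           # (naive) elastic-net coefficients
```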